The primary objective of this project is to produce hourly solar power forecasts for the Edikli GES (Güneş Enerjisi Santrali, i.e. solar power plant) located in Niğde. The forecasting period spans May 14 to June 4, with 24 hourly predictions generated for each day. Each forecast uses the production data available up to two days before the target date, and the data is refreshed daily so that every forecast is based on the most recent observations.
The data utilized in this project comprises two main components: weather data and solar power production data. The weather data includes variables such as downward shortwave radiation flux (dswrf_surface), cloud cover at various atmospheric levels (tcdc_low.cloud.layer, tcdc_middle.cloud.layer, tcdc_high.cloud.layer), and temperature at the surface (tmp_surface). This weather data is recorded hourly and provides crucial information on the environmental conditions affecting solar power production.
The solar power production data includes the hourly production values recorded at the Edikli GES plant. Both datasets are merged using a common datetime index, ensuring that each production record is associated with the corresponding weather conditions. This combined dataset is essential for developing accurate predictive models, as it allows for the analysis of how weather variables influence solar power output.
Our approach involves using a combination of weather variables and historical production data to build predictive models. We start with data preprocessing to clean and organize the data, followed by exploratory data analysis to identify key patterns and relationships. We then build several linear regression models, gradually adding more variables to improve the accuracy of our predictions.
require(data.table)
## Loading required package: data.table
require(lubridate)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
require(forecast)
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
require(skimr)
## Loading required package: skimr
require(repr)
## Loading required package: repr
require(openxlsx) #library(openxlsx)
## Loading required package: openxlsx
require(ggplot2)
## Loading required package: ggplot2
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
require(ggcorrplot)
## Loading required package: ggcorrplot
library(readxl)
These libraries are essential for data manipulation, time series analysis, visualization, and handling date-time operations.
todays_date=Sys.Date()
forecast_date=todays_date+1
options(repr.plot.width=12.7, repr.plot.height=8.5)
data_path2='/Users/kesici/Downloads/processed_weather.csv'
weather_info=fread(data_path2)
weather_info[,datetime:=ymd(date)+dhours(hour)]
weather_info=weather_info[order(datetime)]
head(weather_info,25)
## date hour lat lon dswrf_surface tcdc_low.cloud.layer
## <IDat> <int> <num> <num> <num> <num>
## 1: 2022-01-01 4 38.00 35.00 0 0.2
## 2: 2022-01-01 4 38.50 35.25 0 1.6
## 3: 2022-01-01 4 37.75 34.75 0 4.4
## 4: 2022-01-01 4 38.75 34.50 0 5.0
## 5: 2022-01-01 4 37.75 34.50 0 0.0
## 6: 2022-01-01 4 38.25 34.75 0 0.0
## 7: 2022-01-01 4 38.75 35.00 0 5.0
## 8: 2022-01-01 4 38.50 35.00 0 1.7
## 9: 2022-01-01 4 38.25 34.50 0 5.0
## 10: 2022-01-01 4 37.75 35.00 0 1.7
## 11: 2022-01-01 4 38.00 34.50 0 0.0
## 12: 2022-01-01 4 38.50 34.50 0 2.9
## 13: 2022-01-01 4 37.75 35.25 0 1.0
## 14: 2022-01-01 4 37.75 35.50 0 3.5
## 15: 2022-01-01 4 38.00 34.75 0 3.0
## 16: 2022-01-01 4 38.75 35.50 0 1.4
## 17: 2022-01-01 4 38.25 35.00 0 0.0
## 18: 2022-01-01 4 38.25 35.50 0 5.0
## 19: 2022-01-01 4 38.75 35.25 0 2.1
## 20: 2022-01-01 4 38.00 35.25 0 0.0
## 21: 2022-01-01 4 38.25 35.25 0 4.0
## 22: 2022-01-01 4 38.50 35.50 0 0.0
## 23: 2022-01-01 4 38.00 35.50 0 4.1
## 24: 2022-01-01 4 38.75 34.75 0 5.0
## 25: 2022-01-01 4 38.50 34.75 0 3.0
## date hour lat lon dswrf_surface tcdc_low.cloud.layer
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## <num> <num> <num>
## 1: 5.0 2.1 8.2
## 2: 0.0 1.6 3.3
## 3: 21.8 6.9 32.7
## 4: 0.0 5.0 14.7
## 5: 36.1 5.8 41.4
## 6: 0.0 7.5 9.1
## 7: 0.9 9.7 18.3
## 8: 0.0 5.0 8.8
## 9: 0.0 5.0 13.2
## 10: 25.1 5.0 32.3
## 11: 5.0 7.6 14.0
## 12: 0.0 5.0 12.5
## 13: 13.9 5.0 21.7
## 14: 19.0 5.0 28.0
## 15: 5.0 7.2 15.1
## 16: 0.1 5.0 6.7
## 17: 0.0 1.7 1.7
## 18: 0.0 0.0 5.2
## 19: 1.4 5.9 10.6
## 20: 5.1 4.1 10.6
## 21: 0.0 0.0 4.0
## 22: 0.0 0.0 0.0
## 23: 9.5 5.0 18.2
## 24: 0.7 5.0 15.7
## 25: 0.0 5.0 11.4
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface uswrf_surface
## <num> <int> <num> <num>
## 1: 0 0 219.279 0
## 2: 0 0 227.479 0
## 3: 0 0 227.179 0
## 4: 0 0 241.779 0
## 5: 0 0 241.879 0
## 6: 0 0 230.579 0
## 7: 0 0 236.379 0
## 8: 0 0 228.379 0
## 9: 0 0 228.079 0
## 10: 0 0 217.179 0
## 11: 0 0 226.179 0
## 12: 0 0 234.379 0
## 13: 0 0 214.779 0
## 14: 0 0 235.679 0
## 15: 0 0 225.079 0
## 16: 0 0 232.479 0
## 17: 0 0 230.679 0
## 18: 0 0 222.979 0
## 19: 0 0 234.279 0
## 20: 0 0 209.479 0
## 21: 0 0 232.879 0
## 22: 0 0 211.779 0
## 23: 0 0 221.879 0
## 24: 0 0 239.679 0
## 25: 0 0 229.579 0
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface uswrf_surface
## tmp_surface datetime
## <num> <POSc>
## 1: 268.804 2022-01-01 04:00:00
## 2: 271.204 2022-01-01 04:00:00
## 3: 268.304 2022-01-01 04:00:00
## 4: 271.404 2022-01-01 04:00:00
## 5: 272.504 2022-01-01 04:00:00
## 6: 271.204 2022-01-01 04:00:00
## 7: 270.904 2022-01-01 04:00:00
## 8: 270.504 2022-01-01 04:00:00
## 9: 270.104 2022-01-01 04:00:00
## 10: 265.604 2022-01-01 04:00:00
## 11: 268.204 2022-01-01 04:00:00
## 12: 271.404 2022-01-01 04:00:00
## 13: 264.704 2022-01-01 04:00:00
## 14: 269.304 2022-01-01 04:00:00
## 15: 269.004 2022-01-01 04:00:00
## 16: 271.204 2022-01-01 04:00:00
## 17: 271.204 2022-01-01 04:00:00
## 18: 268.304 2022-01-01 04:00:00
## 19: 270.304 2022-01-01 04:00:00
## 20: 265.004 2022-01-01 04:00:00
## 21: 271.304 2022-01-01 04:00:00
## 22: 262.204 2022-01-01 04:00:00
## 23: 265.604 2022-01-01 04:00:00
## 24: 271.404 2022-01-01 04:00:00
## 25: 270.804 2022-01-01 04:00:00
## tmp_surface datetime
data_path='/Users/kesici/Downloads/production 2.csv'
production=fread(data_path)
production[,datetime:=ymd(date)+dhours(hour)]
production=production[order(datetime)]
head(production,25)
## date hour production datetime
## <IDat> <int> <num> <POSc>
## 1: 2022-01-01 0 0.00 2022-01-01 00:00:00
## 2: 2022-01-01 1 0.00 2022-01-01 01:00:00
## 3: 2022-01-01 2 0.00 2022-01-01 02:00:00
## 4: 2022-01-01 3 0.00 2022-01-01 03:00:00
## 5: 2022-01-01 4 0.00 2022-01-01 04:00:00
## 6: 2022-01-01 5 0.00 2022-01-01 05:00:00
## 7: 2022-01-01 6 0.00 2022-01-01 06:00:00
## 8: 2022-01-01 7 0.00 2022-01-01 07:00:00
## 9: 2022-01-01 8 3.40 2022-01-01 08:00:00
## 10: 2022-01-01 9 6.80 2022-01-01 09:00:00
## 11: 2022-01-01 10 9.38 2022-01-01 10:00:00
## 12: 2022-01-01 11 7.65 2022-01-01 11:00:00
## 13: 2022-01-01 12 6.80 2022-01-01 12:00:00
## 14: 2022-01-01 13 5.10 2022-01-01 13:00:00
## 15: 2022-01-01 14 5.10 2022-01-01 14:00:00
## 16: 2022-01-01 15 1.70 2022-01-01 15:00:00
## 17: 2022-01-01 16 0.00 2022-01-01 16:00:00
## 18: 2022-01-01 17 0.00 2022-01-01 17:00:00
## 19: 2022-01-01 18 0.00 2022-01-01 18:00:00
## 20: 2022-01-01 19 0.00 2022-01-01 19:00:00
## 21: 2022-01-01 20 0.00 2022-01-01 20:00:00
## 22: 2022-01-01 21 0.00 2022-01-01 21:00:00
## 23: 2022-01-01 22 0.00 2022-01-01 22:00:00
## 24: 2022-01-01 23 0.00 2022-01-01 23:00:00
## 25: 2022-01-02 0 0.00 2022-01-02 00:00:00
## date hour production datetime
str(production)
## Classes 'data.table' and 'data.frame': 21000 obs. of 4 variables:
## $ date : IDate, format: "2022-01-01" "2022-01-01" ...
## $ hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ production: num 0 0 0 0 0 0 0 0 3.4 6.8 ...
## $ datetime : POSIXct, format: "2022-01-01 00:00:00" "2022-01-01 01:00:00" ...
## - attr(*, ".internal.selfref")=<externalptr>
After loading the required libraries, the code sets the current date and the forecast date for generating predictions. It adjusts the plot dimensions for better visualization. The weather data is read from a CSV file, and a new datetime column is created by combining the date and hour columns. This data is then sorted by the datetime column. Similarly, the solar power production data is read from another CSV file, a datetime column is created, and the data is sorted accordingly. Displaying the first few rows of both datasets and examining their structure ensures that the data is correctly formatted and ready for further analysis.
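The timestamp construction described above can be sketched on a toy table. The table `toy` and its values are made up for illustration, and `ymd_h()` on a pasted string is used here as one alternative way to obtain the same kind of timestamp as `ymd(date)+dhours(hour)`; it is not the call used in this report.

```r
library(data.table)
library(lubridate)

# Toy table with the same `date` and `hour` columns as the CSV files (made-up values)
toy <- data.table(
  date = as.IDate(c("2022-01-01", "2022-01-01", "2022-01-02")),
  hour = c(4L, 23L, 0L)
)

# Parse "YYYY-MM-DD H" into a POSIXct timestamp (UTC by default)
toy[, datetime := ymd_h(paste(date, hour))]

# Sort by time and check that no timestamp occurs twice
setorder(toy, datetime)
n_duplicates <- sum(duplicated(toy$datetime))
```

A duplicate check of this kind is cheap and guards against an hour being counted twice when the tables are later merged on `datetime`.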
# Average each weather variable over the 25 grid points for every date and hour.
# Note that uswrf_surface is stored under the name swrf_surface.
hourly_series=weather_info[,list(dswrf_surface=sum(dswrf_surface)/25,
                                 tcdc_low.cloud.layer=sum(tcdc_low.cloud.layer)/25,
                                 tcdc_middle.cloud.layer=sum(tcdc_middle.cloud.layer)/25,
                                 tcdc_high.cloud.layer=sum(tcdc_high.cloud.layer)/25,
                                 tcdc_entire.atmosphere=sum(tcdc_entire.atmosphere)/25,
                                 uswrf_top_of_atmosphere=sum(uswrf_top_of_atmosphere)/25,
                                 csnow_surface=sum(csnow_surface)/25,
                                 dlwrf_surface=sum(dlwrf_surface)/25,
                                 swrf_surface=sum(uswrf_surface)/25,
                                 tmp_surface=sum(tmp_surface)/25),
                           list(date,hour)]
hourly_series[,datetime:=ymd(date)+dhours(hour)]
head(hourly_series)
## date hour dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
## <IDat> <int> <num> <num> <num>
## 1: 2022-01-01 4 0.0000 2.384 5.944
## 2: 2022-01-01 5 0.0000 2.784 4.324
## 3: 2022-01-01 6 0.0000 2.964 5.372
## 4: 2022-01-01 7 0.0000 3.284 9.212
## 5: 2022-01-01 8 0.0000 3.672 11.252
## 6: 2022-01-01 9 7.3688 4.120 10.880
## tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
## <num> <num> <num>
## 1: 4.604 14.296 0.00000
## 2: 10.636 19.272 0.00000
## 3: 11.688 21.772 0.00000
## 4: 20.736 31.992 0.00000
## 5: 26.432 38.376 0.00000
## 6: 35.088 45.856 8.96704
## csnow_surface dlwrf_surface swrf_surface tmp_surface datetime
## <num> <num> <num> <num> <POSc>
## 1: 0 227.999 0.0000 269.220 2022-01-01 04:00:00
## 2: 0 227.774 0.0000 269.104 2022-01-01 05:00:00
## 3: 0 227.764 0.0000 269.035 2022-01-01 06:00:00
## 4: 0 228.196 0.0000 269.001 2022-01-01 07:00:00
## 5: 0 228.657 0.0000 269.002 2022-01-01 08:00:00
## 6: 0 229.416 2.4128 271.634 2022-01-01 09:00:00
The code above aggregates the weather data into one row per hour by averaging each variable over the 25 grid points.
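The same aggregation can be written more compactly with `.SDcols`. The sketch below is an illustration on made-up data with three grid points per hour, not the code used in this report. It uses `mean()`, which divides by the number of rows actually present in each hour, so it matches the division by 25 only when every hour has all 25 grid points and no missing values.

```r
library(data.table)

# Toy weather table: 2 hours x 3 grid points (the real file has 25 grid points per hour)
toy_weather <- data.table(
  date = as.IDate("2022-01-01"),
  hour = rep(c(9L, 10L), each = 3),
  lat  = rep(c(37.75, 38.00, 38.25), times = 2),
  lon  = 35.00,
  dswrf_surface = c(6, 7, 8, 170, 180, 190),
  tmp_surface   = c(270, 271, 272, 275, 276, 277)
)

# Every column except the keys and the coordinates is treated as a weather variable
weather_cols <- setdiff(names(toy_weather), c("date", "hour", "lat", "lon"))

# One row per (date, hour), each weather variable averaged over the grid points
toy_hourly <- toy_weather[, lapply(.SD, mean), by = .(date, hour), .SDcols = weather_cols]
```

Selecting the columns programmatically avoids retyping each variable name, which is where a mismatch between an input and an output name can slip in.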
mergeddata<-merge(hourly_series,production,by="datetime",all.x=TRUE)
head(mergeddata)
## Key: <datetime>
## datetime date.x hour.x dswrf_surface tcdc_low.cloud.layer
## <POSc> <IDat> <int> <num> <num>
## 1: 2022-01-01 04:00:00 2022-01-01 4 0.0000 2.384
## 2: 2022-01-01 05:00:00 2022-01-01 5 0.0000 2.784
## 3: 2022-01-01 06:00:00 2022-01-01 6 0.0000 2.964
## 4: 2022-01-01 07:00:00 2022-01-01 7 0.0000 3.284
## 5: 2022-01-01 08:00:00 2022-01-01 8 0.0000 3.672
## 6: 2022-01-01 09:00:00 2022-01-01 9 7.3688 4.120
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## <num> <num> <num>
## 1: 5.944 4.604 14.296
## 2: 4.324 10.636 19.272
## 3: 5.372 11.688 21.772
## 4: 9.212 20.736 31.992
## 5: 11.252 26.432 38.376
## 6: 10.880 35.088 45.856
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface tmp_surface
## <num> <num> <num> <num> <num>
## 1: 0.00000 0 227.999 0.0000 269.220
## 2: 0.00000 0 227.774 0.0000 269.104
## 3: 0.00000 0 227.764 0.0000 269.035
## 4: 0.00000 0 228.196 0.0000 269.001
## 5: 0.00000 0 228.657 0.0000 269.002
## 6: 8.96704 0 229.416 2.4128 271.634
## date.y hour.y production
## <IDat> <int> <num>
## 1: 2022-01-01 4 0.0
## 2: 2022-01-01 5 0.0
## 3: 2022-01-01 6 0.0
## 4: 2022-01-01 7 0.0
## 5: 2022-01-01 8 3.4
## 6: 2022-01-01 9 6.8
newdata=mergeddata
newdata=newdata[,-c("date.y")]
newdata=newdata[,-c("hour.y")]
basedata=newdata[,-c("date.x")]
basedata=basedata[,-c("hour.x")]
basedata=basedata[,-c("datetime")]
head(newdata)
## Key: <datetime>
## datetime date.x hour.x dswrf_surface tcdc_low.cloud.layer
## <POSc> <IDat> <int> <num> <num>
## 1: 2022-01-01 04:00:00 2022-01-01 4 0.0000 2.384
## 2: 2022-01-01 05:00:00 2022-01-01 5 0.0000 2.784
## 3: 2022-01-01 06:00:00 2022-01-01 6 0.0000 2.964
## 4: 2022-01-01 07:00:00 2022-01-01 7 0.0000 3.284
## 5: 2022-01-01 08:00:00 2022-01-01 8 0.0000 3.672
## 6: 2022-01-01 09:00:00 2022-01-01 9 7.3688 4.120
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## <num> <num> <num>
## 1: 5.944 4.604 14.296
## 2: 4.324 10.636 19.272
## 3: 5.372 11.688 21.772
## 4: 9.212 20.736 31.992
## 5: 11.252 26.432 38.376
## 6: 10.880 35.088 45.856
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface tmp_surface
## <num> <num> <num> <num> <num>
## 1: 0.00000 0 227.999 0.0000 269.220
## 2: 0.00000 0 227.774 0.0000 269.104
## 3: 0.00000 0 227.764 0.0000 269.035
## 4: 0.00000 0 228.196 0.0000 269.001
## 5: 0.00000 0 228.657 0.0000 269.002
## 6: 8.96704 0 229.416 2.4128 271.634
## production
## <num>
## 1: 0.0
## 2: 0.0
## 3: 0.0
## 4: 0.0
## 5: 3.4
## 6: 6.8
head(basedata)
## dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
## <num> <num> <num>
## 1: 0.0000 2.384 5.944
## 2: 0.0000 2.784 4.324
## 3: 0.0000 2.964 5.372
## 4: 0.0000 3.284 9.212
## 5: 0.0000 3.672 11.252
## 6: 7.3688 4.120 10.880
## tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
## <num> <num> <num>
## 1: 4.604 14.296 0.00000
## 2: 10.636 19.272 0.00000
## 3: 11.688 21.772 0.00000
## 4: 20.736 31.992 0.00000
## 5: 26.432 38.376 0.00000
## 6: 35.088 45.856 8.96704
## csnow_surface dlwrf_surface swrf_surface tmp_surface production
## <num> <num> <num> <num> <num>
## 1: 0 227.999 0.0000 269.220 0.0
## 2: 0 227.774 0.0000 269.104 0.0
## 3: 0 227.764 0.0000 269.035 0.0
## 4: 0 228.196 0.0000 269.001 0.0
## 5: 0 228.657 0.0000 269.002 3.4
## 6: 0 229.416 2.4128 271.634 6.8
The code above merges the aggregated weather data with the production data on datetime. Because all.x is set, every weather hour is kept, and hours without a production record receive NA in the production column.
Next, the duplicated date and hour columns that result from the merge (date.y and hour.y) are removed to form ‘newdata’, and the remaining date.x, hour.x and datetime columns are dropped to form ‘basedata’. The first few rows of both tables are then displayed to confirm the column removals and the final structure of the datasets.
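One way to avoid the duplicated columns altogether is to pass only the needed production columns into the merge. This is a sketch on made-up data, not the approach taken in this report; the resulting columns are named date and hour rather than date.x and hour.x, so later code that refers to the suffixed names would need to be adapted.

```r
library(data.table)

# Toy stand-ins for hourly_series and production (made-up values)
toy_hourly <- data.table(
  datetime = as.POSIXct(c("2022-01-01 08:00", "2022-01-01 09:00", "2022-01-01 10:00"), tz = "UTC"),
  date = as.IDate("2022-01-01"),
  hour = 8:10,
  dswrf_surface = c(0, 7.4, 180.1)
)
toy_production <- data.table(
  datetime = as.POSIXct(c("2022-01-01 08:00", "2022-01-01 09:00"), tz = "UTC"),
  date = as.IDate("2022-01-01"),
  hour = 8:9,
  production = c(3.4, 6.8)
)

# Left join: keep every weather hour, bring over only the production column
toy_merged <- merge(toy_hourly, toy_production[, .(datetime, production)],
                    by = "datetime", all.x = TRUE)
```

The last weather hour has no production record, so its production value is NA, which mirrors the trailing NA rows of the merged table.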
ggpairs(basedata)
## Warning: ggpairs() removed rows containing missing or non-finite values in every panel:
## 1 to 6 rows in panels that involve only weather variables, and 118 to 122 rows in
## panels that involve production.
# hour and datetime were already dropped from basedata above, so only csnow_surface is removed here
basedata=basedata[,-c("csnow_surface")]
head(basedata,25)
## dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
## <num> <num> <num>
## 1: 0.0000 2.384 5.944
## 2: 0.0000 2.784 4.324
## 3: 0.0000 2.964 5.372
## 4: 0.0000 3.284 9.212
## 5: 0.0000 3.672 11.252
## 6: 7.3688 4.120 10.880
## 7: 180.1384 4.180 13.748
## 8: 254.7928 3.972 16.520
## 9: 312.5520 4.408 23.468
## 10: 347.4144 5.348 32.996
## 11: 364.6544 6.772 39.460
## 12: 362.2984 9.116 43.452
## 13: 221.4584 37.100 82.296
## 14: 157.6448 37.032 78.252
## 15: 107.3080 35.876 74.752
## 16: 80.4792 35.080 72.480
## 17: 64.3864 36.096 75.236
## 18: 53.6528 37.028 77.632
## 19: 0.0000 49.732 90.688
## 20: 0.0000 56.532 92.580
## 21: 0.0000 66.812 94.140
## 22: 0.0000 71.096 94.212
## 23: 0.0000 74.580 94.000
## 24: 0.0000 77.660 94.556
## 25: 0.0000 93.816 91.588
## dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
## tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
## <num> <num> <num>
## 1: 4.604 14.296 0.00000
## 2: 10.636 19.272 0.00000
## 3: 11.688 21.772 0.00000
## 4: 20.736 31.992 0.00000
## 5: 26.432 38.376 0.00000
## 6: 35.088 45.856 8.96704
## 7: 80.764 85.364 135.20448
## 8: 69.392 75.468 152.98944
## 9: 60.516 70.368 167.55392
## 10: 54.852 70.452 180.69312
## 11: 45.880 71.136 187.03872
## 12: 39.864 71.612 188.88640
## 13: 12.680 89.116 168.56448
## 14: 17.376 85.688 133.18080
## 15: 27.016 83.300 92.25216
## 16: 37.268 83.508 69.18720
## 17: 44.544 85.644 55.34976
## 18: 49.756 87.520 46.12352
## 19: 68.948 98.276 0.00000
## 20: 69.140 98.380 0.00000
## 21: 76.844 98.760 0.00000
## 22: 80.404 98.492 0.00000
## 23: 84.192 98.792 0.00000
## 24: 86.644 98.980 0.00000
## 25: 99.228 100.000 0.00000
## tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
## dlwrf_surface swrf_surface tmp_surface production
## <num> <num> <num> <num>
## 1: 227.999 0.00000 269.220 0.00
## 2: 227.774 0.00000 269.104 0.00
## 3: 227.764 0.00000 269.035 0.00
## 4: 228.196 0.00000 269.001 0.00
## 5: 228.657 0.00000 269.002 3.40
## 6: 229.416 2.41280 271.634 6.80
## 7: 237.291 59.18848 275.786 9.38
## 8: 237.387 82.53120 278.553 7.65
## 9: 238.387 99.73056 280.215 6.80
## 10: 240.887 109.43168 280.609 5.10
## 11: 243.015 113.51040 280.277 5.10
## 12: 245.296 112.10304 279.165 1.70
## 13: 260.756 66.06976 277.059 0.00
## 14: 258.960 47.22432 273.467 0.00
## 15: 257.256 32.11968 271.659 0.00
## 16: 256.506 24.08960 271.537 0.00
## 17: 257.705 19.27104 271.715 0.00
## 18: 259.485 16.06016 271.855 0.00
## 19: 272.875 0.00000 272.029 0.00
## 20: 274.722 0.00000 272.231 0.00
## 21: 280.774 0.00000 272.513 0.00
## 22: 283.754 0.00000 272.616 0.00
## 23: 287.346 0.00000 273.024 0.00
## 24: 291.060 0.00000 273.426 0.00
## 25: 312.804 0.00000 273.605 0.00
## dlwrf_surface swrf_surface tmp_surface production
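# basedata contains missing values; use only complete rows so the correlations are not NA
corr<-round(cor(basedata,use="complete.obs"),1)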
The provided code conducts exploratory data analysis by visualizing pairwise relationships and computing correlations between variables in the dataset.
Next, csnow_surface is removed from basedata because it is not needed for the correlation analysis. The hour and datetime columns were already dropped in the previous step, so no further removal is needed for them.
Finally, we calculate the correlation matrix for the remaining variables in ‘basedata’.
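By default, cor() returns NA for any pair of variables in which a value is missing, and its use argument controls how such rows are handled. The toy example below, with made-up values and one missing production value, illustrates the difference; it is a sketch and not part of the original analysis.

```r
# Toy data with one missing production value (made-up values)
toy <- data.frame(
  dswrf_surface = c(0, 100, 200, 300, 400),
  tmp_surface   = c(270, 274, 279, 283, 290),
  production    = c(0, 2, 4, 6, NA)
)

# Default behaviour: the NA propagates to every correlation that involves production
corr_default <- cor(toy)

# Use only the rows in which every variable is observed
corr_complete <- round(cor(toy, use = "complete.obs"), 1)
```

A matrix such as `corr_complete` can then be passed to `ggcorrplot()` from the ggcorrplot package loaded earlier to draw a correlation heatmap.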
daily_series=newdata[,list(total=sum(production)),by=list(date.x)]
ggplot(daily_series, aes(date.x,total, group=1)) + geom_line() +geom_point()
## Warning: Removed 5 rows containing missing values (`geom_line()`).
## Warning: Removed 5 rows containing missing values (`geom_point()`).
a=newdata[!is.na(production)]
acf(a$production)
newdata
## Key: <datetime>
## datetime date.x hour.x dswrf_surface tcdc_low.cloud.layer
## <POSc> <IDat> <int> <num> <num>
## 1: 2022-01-01 04:00:00 2022-01-01 4 0.0000 2.384
## 2: 2022-01-01 05:00:00 2022-01-01 5 0.0000 2.784
## 3: 2022-01-01 06:00:00 2022-01-01 6 0.0000 2.964
## 4: 2022-01-01 07:00:00 2022-01-01 7 0.0000 3.284
## 5: 2022-01-01 08:00:00 2022-01-01 8 0.0000 3.672
## ---
## 21110: 2024-05-29 17:00:00 2024-05-29 17 557.5424 12.980
## 21111: 2024-05-29 18:00:00 2024-05-29 18 475.3048 12.872
## 21112: 2024-05-29 19:00:00 2024-05-29 19 394.9286 11.532
## 21113: 2024-05-29 20:00:00 2024-05-29 20 321.1014 10.816
## 21114: 2024-05-29 21:00:00 2024-05-29 21 267.5859 11.804
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## <num> <num> <num>
## 1: 5.944 4.604 14.296
## 2: 4.324 10.636 19.272
## 3: 5.372 11.688 21.772
## 4: 9.212 20.736 31.992
## 5: 11.252 26.432 38.376
## ---
## 21110: 39.776 7.884 48.748
## 21111: 43.312 8.256 51.632
## 21112: 43.768 7.324 51.172
## 21113: 43.488 6.524 50.376
## 21114: 42.588 5.464 48.780
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface
## <num> <num> <num> <num>
## 1: 0.0000 0 227.999 0.00000
## 2: 0.0000 0 227.774 0.00000
## 3: 0.0000 0 227.764 0.00000
## 4: 0.0000 0 228.196 0.00000
## 5: 0.0000 0 228.657 0.00000
## ---
## 21110: 261.6896 0 337.264 101.88800
## 21111: 242.1421 0 337.765 89.93280
## 21112: 214.1965 0 336.652 76.80448
## 21113: 179.0547 0 335.485 62.73920
## 21114: 149.2122 0 333.756 52.28160
## tmp_surface production
## <num> <num>
## 1: 269.220 0.0
## 2: 269.104 0.0
## 3: 269.035 0.0
## 4: 269.001 0.0
## 5: 269.002 3.4
## ---
## 21110: 297.253 NA
## 21111: 294.218 NA
## 21112: 291.334 NA
## 21113: 288.449 NA
## 21114: 287.744 NA
First, we create a daily aggregated series of production data, then a line plot is drawn to visualize the daily total production. After these steps, we plot the autocorrelation function (ACF) to identify patterns and periodicity in the time series data.
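The warnings about removed rows arise because a daily sum is NA whenever any hour of that day has no production value. A minimal sketch on made-up data shows how the number of missing hours per day could be tracked alongside the total; the column name `n_missing` is an assumption introduced for illustration.

```r
library(data.table)

# Toy table: one complete day and one day with missing production (made-up values)
toy <- data.table(
  date.x = as.IDate(rep(c("2024-05-27", "2024-05-28"), each = 3)),
  production = c(1, 2, 3, 4, NA, NA)
)

# Daily total (NA when any hour is missing) plus the count of missing hours
toy_daily <- toy[, .(total = sum(production),
                     n_missing = sum(is.na(production))),
                 by = date.x]

# Keep only the days for which every hour was observed
toy_complete_days <- toy_daily[n_missing == 0]
```

Filtering on `n_missing` makes it explicit which days are dropped from the plot instead of leaving that to the plotting warnings.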
production1 <- ts(production$production, freq=365)
daily_ts_multip<-decompose(production1, type="additive")
plot(daily_ts_multip)
Seasonal and trend decomposition separates a time series into three components: trend, seasonal, and residual. The trend component captures the long-term direction, the seasonal component identifies regular repeating patterns, and the residual component represents random noise. This decomposition helps in better understanding and modeling the different factors influencing solar power production, improving the accuracy of our predictive models.
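The frequency passed to ts() determines which cycle decompose() treats as seasonal. The sketch below uses a synthetic hourly series, not the plant data, and sets the frequency to 24 so that one seasonal cycle is one day; this is an assumption made for illustration and differs from the frequency of 365 used above.

```r
# Synthetic hourly series: 10 days with a repeating daytime bump plus small noise
set.seed(1)
hours <- 0:(24 * 10 - 1)
daily_shape <- pmax(0, sin((hours %% 24 - 6) / 12 * pi))  # zero at night, peak at noon
toy_production <- 10 * daily_shape + rnorm(length(hours), sd = 0.05)

# frequency = 24 tells ts() that one seasonal cycle spans 24 observations (one day)
toy_ts <- ts(toy_production, frequency = 24)

# Additive decomposition into trend, seasonal and random components
toy_dec <- decompose(toy_ts, type = "additive")

# toy_dec$figure holds the estimated seasonal effect for each of the 24 hours
peak_position <- which.max(toy_dec$figure)
```

Calling `plot(toy_dec)` would draw the observed series together with the three components, in the same way as the plot of the production decomposition.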
Our approach to forecasting hourly solar power production involves several key steps to prepare and analyze the data, followed by developing predictive models.
Firstly, we convert our cleaned and processed data into a data.table format for efficient manipulation. We then create several lagged variables of the production data, which capture the influence of past production values on current production. These lagged variables range from 1-hour to 96-hour intervals, providing a comprehensive temporal view of past production trends.
In addition to lagged production values, we generate categorical features to capture temporal patterns, such as the hour of the day, the season, and other date-related factors. For instance, we categorize the hour of the day and the quarter of the year (season). We also extract specific components from the datetime field, such as the hour, day, week, and month, to create features like saat, gun, hafta, and ay.
To incorporate weather effects, we calculate the maximum (tmax) and minimum (tmin) daily surface temperatures. We also introduce a trend variable to capture any underlying trends over the period of data collection.
Finally, we create lagged weather variables, such as lagged downward shortwave radiation flux (dswrf_surface), to account for delayed effects of weather conditions on production.
By enriching our dataset with these engineered features, we aim to capture a wide range of factors influencing solar power production, which forms the basis for our predictive modeling.
datapn<-data.table(newdata)
#head(datapn,15)
# Lagged production features (lags in rows, i.e. hours); shift() returns one column per lag
lags <- c(15L, 48L, 72L, 96L, 95L, 47L, 71L, 49L, 73L, 14L, 13L, 12L, 11L, 16L, 24L, 23L, 25L, 8L, 6L, 1L, 2L)
lag_cols <- paste0("lag", lags)
datapn[, (lag_cols) := shift(production, n = lags, fill = NA)]
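# Categorical time features
datapn$hoursoftheday<-as.factor(datapn$hour.x)
datapn$season<-as.factor(quarter(datapn$date.x))
# saat = hour, gun = day of month, hafta = week of year, ay = month (stored as character)
datapn[,saat:=as.character(hour(datetime))]
datapn[,gun:=as.character(day(date.x))]
datapn[,hafta:=as.character(week(date.x))]
datapn[,ay:=as.character(month(date.x))]
# Daily maximum and minimum surface temperature, repeated on every hour of the day
datapn[,tmax:=max(tmp_surface),by=date.x]
datapn[,tmin:=min(tmp_surface),by=date.x]
# Linear trend index over all rows
trend<-seq_len(nrow(datapn))
datapn$trend<-trend
# Lagged downward shortwave radiation flux (1 and 12 hours)
lag1dswrf<-shift(datapn$dswrf_surface, n=1L, fill=NA)
datapn$lag1dswrf<-lag1dswrf
lag12dswrf<-shift(datapn$dswrf_surface, n=12L, fill=NA)
datapn$lag12dswrf<-lag12dswrf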
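To make the effect of these features concrete, here is a small sketch on a toy table. The table `toy` and its values are hypothetical and are not taken from the plant data; only `shift()` and the grouped `max()`/`min()` pattern mirror the code above.

```r
library(data.table)

# Toy table (hypothetical values): two days with three "hours" each
toy <- data.table(
  date.x      = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01",
                          "2024-01-02", "2024-01-02", "2024-01-02")),
  production  = c(0, 4, 2, 0, 6, 3),
  tmp_surface = c(270, 280, 275, 268, 282, 277)
)

# Lag features: row i receives the production value from n rows earlier
lags     <- c(1L, 2L)
lag_cols <- paste0("lag", lags)
toy[, (lag_cols) := shift(production, n = lags, fill = NA)]

# Daily aggregates, repeated on every row of the same day
toy[, tmax := max(tmp_surface), by = date.x]
toy[, tmin := min(tmp_surface), by = date.x]

print(toy)
```

Two things are worth noticing in the printed table. The lags run across day boundaries, so the first row of the second day takes its `lag1` value from the last row of the first day. The first `n` rows of each lag column are `NA`, and `lm()` drops such rows by default, which is consistent with the "observations deleted due to missingness" lines in the model summaries.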
After preparing and enriching our dataset with lagged variables and categorical features, we proceed to develop and evaluate multiple linear regression models to predict solar power production. The process begins with simple models and progressively incorporates more variables to enhance predictive accuracy.
For example, the first model (lm0) is a simple linear regression where production is predicted solely based on the downward shortwave radiation flux (dswrf_surface).
For each model, we examine the summary output and the residual diagnostics to assess its performance.
lm0<-lm(production~dswrf_surface,data = datapn)
summary(lm0)
##
## Call:
## lm(formula = production ~ dswrf_surface, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9182 -1.4676 -0.7549 1.2159 9.8480
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.549e-01 2.579e-02 29.27 <2e-16 ***
## dswrf_surface 8.383e-03 7.419e-05 113.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.912 on 20993 degrees of freedom
## (119 observations deleted due to missingness)
## Multiple R-squared: 0.3782, Adjusted R-squared: 0.3782
## F-statistic: 1.277e+04 on 1 and 20993 DF, p-value: < 2.2e-16
checkresiduals(lm0)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 15809, df = 10, p-value < 2.2e-16
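The Breusch-Godfrey statistic reported by `checkresiduals()` can be illustrated with a short base-R sketch. It is not the package's implementation: it assumes the common chi-squared form of the test (sample size times the R-squared of an auxiliary regression) with zero-padded lagged residuals, and it uses simulated data with AR(1) errors rather than the plant data.

```r
# Simulated regression with serially correlated errors (illustrative only)
set.seed(1)
n <- 500
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # AR(1) errors
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Matrix of residuals lagged 1..p rows, padded with zeros at the start
p <- 10
lagmat <- sapply(seq_len(p), function(k) c(rep(0, k), res[seq_len(n - k)]))

# Auxiliary regression: residuals on the original regressor and their own lags
aux <- lm(res ~ x + lagmat)

# LM statistic and p-value against a chi-squared distribution with p df
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = p, lower.tail = FALSE)
c(LM = lm_stat, p.value = p_value)
```

A large statistic with a very small p-value, as in the output above, indicates that the residuals are still serially correlated, so the model has not yet captured the time structure of the series.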
#################################################
lm2<-lm(production~dswrf_surface+lag12,data = datapn)
summary(lm2)
##
## Call:
## lm(formula = production ~ dswrf_surface + lag12, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6608 -1.8337 -0.3909 1.1284 8.8525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.834e+00 3.222e-02 56.92 <2e-16 ***
## dswrf_surface 6.819e-03 7.636e-05 89.31 <2e-16 ***
## lag12 -2.864e-01 5.602e-03 -51.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.746 on 20980 degrees of freedom
## (131 observations deleted due to missingness)
## Multiple R-squared: 0.4472, Adjusted R-squared: 0.4472
## F-statistic: 8487 on 2 and 20980 DF, p-value: < 2.2e-16
checkresiduals(lm2)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 15261, df = 10, p-value < 2.2e-16
#############################################
lm3<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer,data = datapn)
summary(lm3)
##
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer,
## data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.429 -1.865 -0.245 1.281 8.992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.592e+00 3.832e-02 67.63 <2e-16 ***
## dswrf_surface 6.189e-03 7.652e-05 80.87 <2e-16 ***
## lag12 -3.367e-01 5.644e-03 -59.66 <2e-16 ***
## tcdc_low.cloud.layer -2.276e-02 6.620e-04 -34.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.672 on 20978 degrees of freedom
## (132 observations deleted due to missingness)
## Multiple R-squared: 0.4767, Adjusted R-squared: 0.4766
## F-statistic: 6370 on 3 and 20978 DF, p-value: < 2.2e-16
checkresiduals(lm3)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 15040, df = 10, p-value < 2.2e-16
############################################
lm4<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag6+lag1,data = datapn)
summary(lm4)
##
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer +
## lag6 + lag1, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7170 -0.7270 -0.2142 0.6585 9.2641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.249e+00 2.197e-02 56.847 < 2e-16 ***
## dswrf_surface -4.525e-04 7.433e-05 -6.088 1.17e-09 ***
## lag12 -9.157e-02 3.135e-03 -29.208 < 2e-16 ***
## tcdc_low.cloud.layer -8.786e-03 3.555e-04 -24.715 < 2e-16 ***
## lag6 -1.749e-01 3.509e-03 -49.842 < 2e-16 ***
## lag1 8.942e-01 5.012e-03 178.406 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.396 on 20976 degrees of freedom
## (132 observations deleted due to missingness)
## Multiple R-squared: 0.8573, Adjusted R-squared: 0.8572
## F-statistic: 2.52e+04 on 5 and 20976 DF, p-value: < 2.2e-16
checkresiduals(lm4)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 2984.9, df = 10, p-value < 2.2e-16
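As the list of candidate models grows, it can help to collect the main fit statistics in a single table instead of reading each summary separately. The sketch below does this for three toy models; the data frame `toy`, its columns, and the model names `m0` to `m2` are all hypothetical.

```r
# Toy hourly data with a solar-like daily shape (illustrative only)
set.seed(7)
n_days <- 60
hour   <- rep(0:23, times = n_days)
n      <- length(hour)
radiation  <- pmax(0, sin(pi * (hour - 6) / 12)) * 800 * runif(n, 0.6, 1)
production <- 0.01 * radiation + rnorm(n, sd = 0.5)

toy <- data.frame(production, radiation, hour = factor(hour))
toy$lag24 <- c(rep(NA, 24), head(toy$production, -24))  # same hour, previous day

# Nested sequence of models, from simple to richer
fits <- list(
  m0 = lm(production ~ radiation, data = toy),
  m1 = lm(production ~ radiation + lag24, data = toy),
  m2 = lm(production ~ radiation + lag24 + hour, data = toy)
)

# One row per model: adjusted R-squared, residual standard error, rows used
comparison <- data.frame(
  model  = names(fits),
  adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
  sigma  = sapply(fits, function(f) summary(f)$sigma),
  n_used = sapply(fits, nobs)
)
print(comparison)
```

The `n_used` column is included because models with longer lags are fitted on fewer rows, so their statistics are not computed on exactly the same sample.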
##########################################
lm5<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag1+tmax+tmin+lag6,data = datapn)
summary(lm5)
##
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer +
## lag1 + tmax + tmin + lag6, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5202 -0.7429 -0.1304 0.6851 9.4678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.146e-02 4.411e-01 -0.185 0.853
## dswrf_surface -7.483e-04 7.485e-05 -9.998 < 2e-16 ***
## lag12 -1.078e-01 3.188e-03 -33.818 < 2e-16 ***
## tcdc_low.cloud.layer -3.444e-03 4.293e-04 -8.024 1.08e-15 ***
## lag1 8.947e-01 4.962e-03 180.312 < 2e-16 ***
## tmax 3.895e-02 2.177e-03 17.892 < 2e-16 ***
## tmin -3.689e-02 3.387e-03 -10.890 < 2e-16 ***
## lag6 -1.758e-01 3.475e-03 -50.595 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.38 on 20951 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.8605, Adjusted R-squared: 0.8604
## F-statistic: 1.846e+04 on 7 and 20951 DF, p-value: < 2.2e-16
checkresiduals(lm5)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 5409.6, df = 11, p-value < 2.2e-16
##########################################
lm6<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag1+tmax+hoursoftheday+ay,data = datapn)
summary(lm6)
##
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer +
## lag1 + tmax + hoursoftheday + ay, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6619 -0.2467 0.0136 0.3726 8.2767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.583e+00 4.986e-01 -5.180 2.24e-07 ***
## dswrf_surface 1.494e-04 8.528e-05 1.752 0.079721 .
## lag12 -5.812e-02 4.752e-03 -12.230 < 2e-16 ***
## tcdc_low.cloud.layer -4.355e-03 3.728e-04 -11.681 < 2e-16 ***
## lag1 7.252e-01 5.162e-03 140.470 < 2e-16 ***
## tmax 1.016e-02 1.759e-03 5.779 7.63e-09 ***
## hoursoftheday1 -2.419e-02 5.358e-02 -0.452 0.651603
## hoursoftheday2 -8.560e-02 5.400e-02 -1.585 0.112948
## hoursoftheday3 -1.915e-01 5.579e-02 -3.432 0.000599 ***
## hoursoftheday4 -3.093e-01 5.948e-02 -5.200 2.01e-07 ***
## hoursoftheday5 -3.280e-01 6.304e-02 -5.204 1.97e-07 ***
## hoursoftheday6 1.583e-01 6.501e-02 2.434 0.014925 *
## hoursoftheday7 1.922e+00 6.496e-02 29.581 < 2e-16 ***
## hoursoftheday8 2.975e+00 6.524e-02 45.594 < 2e-16 ***
## hoursoftheday9 2.911e+00 6.773e-02 42.984 < 2e-16 ***
## hoursoftheday10 2.048e+00 7.225e-02 28.348 < 2e-16 ***
## hoursoftheday11 1.708e+00 7.460e-02 22.892 < 2e-16 ***
## hoursoftheday12 1.540e+00 7.620e-02 20.210 < 2e-16 ***
## hoursoftheday13 1.210e+00 7.719e-02 15.679 < 2e-16 ***
## hoursoftheday14 4.471e-01 7.731e-02 5.783 7.42e-09 ***
## hoursoftheday15 -5.921e-01 7.639e-02 -7.751 9.55e-15 ***
## hoursoftheday16 -1.416e+00 7.055e-02 -20.074 < 2e-16 ***
## hoursoftheday17 -1.367e+00 6.829e-02 -20.010 < 2e-16 ***
## hoursoftheday18 -9.759e-01 6.596e-02 -14.796 < 2e-16 ***
## hoursoftheday19 -3.525e-01 6.097e-02 -5.781 7.54e-09 ***
## hoursoftheday20 -1.435e-01 5.683e-02 -2.525 0.011582 *
## hoursoftheday21 -3.338e-02 5.543e-02 -0.602 0.546962
## hoursoftheday22 4.461e-03 5.353e-02 0.083 0.933581
## hoursoftheday23 7.680e-03 5.353e-02 0.143 0.885933
## ay10 5.993e-02 4.955e-02 1.210 0.226422
## ay11 -1.988e-02 4.174e-02 -0.476 0.633828
## ay12 -2.592e-02 3.832e-02 -0.676 0.498821
## ay2 2.221e-01 3.495e-02 6.357 2.10e-10 ***
## ay3 2.161e-01 3.708e-02 5.827 5.73e-09 ***
## ay4 1.290e-01 4.639e-02 2.782 0.005415 **
## ay5 1.677e-01 4.947e-02 3.389 0.000702 ***
## ay6 1.437e-01 5.707e-02 2.517 0.011833 *
## ay7 2.175e-01 6.448e-02 3.373 0.000746 ***
## ay8 8.470e-02 7.320e-02 1.157 0.247253
## ay9 1.275e-01 6.277e-02 2.032 0.042206 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.119 on 20919 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.9084, Adjusted R-squared: 0.9082
## F-statistic: 5318 on 39 and 20919 DF, p-value: < 2.2e-16
checkresiduals(lm6)
##
## Breusch-Godfrey test for serial correlation of order up to 43
##
## data: Residuals
## LM test = 2257.3, df = 43, p-value < 2.2e-16
#########################################
lm7<-lm(production~log(dswrf_surface+1)+lag12+log(tcdc_low.cloud.layer+1)+lag1+tmax+hoursoftheday,data = datapn)
summary(lm7)
##
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + log(tcdc_low.cloud.layer +
## 1) + lag1 + tmax + hoursoftheday, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5817 -0.2423 0.0149 0.3387 8.2946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1869947 0.2458324 -4.828 1.39e-06 ***
## log(dswrf_surface + 1) 0.2479537 0.0143077 17.330 < 2e-16 ***
## lag12 -0.0467701 0.0045833 -10.204 < 2e-16 ***
## log(tcdc_low.cloud.layer + 1) -0.0684753 0.0057417 -11.926 < 2e-16 ***
## lag1 0.7122651 0.0049087 145.104 < 2e-16 ***
## tmax 0.0056814 0.0008251 6.886 5.91e-12 ***
## hoursoftheday1 -0.0190866 0.0533467 -0.358 0.7205
## hoursoftheday2 -0.0680072 0.0537354 -1.266 0.2057
## hoursoftheday3 -0.1528892 0.0554064 -2.759 0.0058 **
## hoursoftheday4 -0.2565045 0.0589452 -4.352 1.36e-05 ***
## hoursoftheday5 -0.2574354 0.0623012 -4.132 3.61e-05 ***
## hoursoftheday6 0.1592948 0.0642637 2.479 0.0132 *
## hoursoftheday7 1.6671100 0.0666115 25.027 < 2e-16 ***
## hoursoftheday8 2.4512918 0.0717948 34.143 < 2e-16 ***
## hoursoftheday9 2.1147454 0.0808276 26.164 < 2e-16 ***
## hoursoftheday10 0.8128840 0.1003606 8.100 5.81e-16 ***
## hoursoftheday11 0.4468876 0.1022329 4.371 1.24e-05 ***
## hoursoftheday12 0.2597592 0.1034482 2.511 0.0120 *
## hoursoftheday13 -0.0822080 0.1041378 -0.789 0.4299
## hoursoftheday14 -0.8546406 0.1043941 -8.187 2.84e-16 ***
## hoursoftheday15 -1.9059378 0.1044105 -18.254 < 2e-16 ***
## hoursoftheday16 -2.7015715 0.1015898 -26.593 < 2e-16 ***
## hoursoftheday17 -2.6379205 0.1009382 -26.134 < 2e-16 ***
## hoursoftheday18 -2.2205794 0.0991834 -22.389 < 2e-16 ***
## hoursoftheday19 -1.5786783 0.0949048 -16.634 < 2e-16 ***
## hoursoftheday20 -1.3519092 0.0906125 -14.920 < 2e-16 ***
## hoursoftheday21 -1.2211888 0.0881230 -13.858 < 2e-16 ***
## hoursoftheday22 0.0024854 0.0532986 0.047 0.9628
## hoursoftheday23 0.0056090 0.0533007 0.105 0.9162
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.114 on 20930 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.9091, Adjusted R-squared: 0.909
## F-statistic: 7479 on 28 and 20930 DF, p-value: < 2.2e-16
checkresiduals(lm7)
##
## Breusch-Godfrey test for serial correlation of order up to 32
##
## data: Residuals
## LM test = 2108.4, df = 32, p-value < 2.2e-16
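The `log(dswrf_surface + 1)` and `log(tcdc_low.cloud.layer + 1)` terms in this formula add 1 before taking the logarithm. A brief sketch of why that offset matters, using hypothetical radiation values that include night-time zeros:

```r
# Hypothetical radiation values, including zeros for night hours
dswrf <- c(0, 0, 12.5, 310.2, 845.0)

raw_log     <- log(dswrf)      # -Inf wherever radiation is zero
shifted_log <- log(dswrf + 1)  # finite everywhere; zero maps to zero

# log1p() is the built-in equivalent of log(x + 1)
same_as_log1p <- isTRUE(all.equal(shifted_log, log1p(dswrf)))

data.frame(dswrf, raw_log, shifted_log)
```

Without the offset, every zero-radiation hour would produce `-Inf` and could not be used in the regression.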
########################################
lm8<-lm(production~log(dswrf_surface+1)+lag12+season+log(tcdc_low.cloud.layer+1)+lag1+tmax+hoursoftheday,data = datapn)
summary(lm8)
##
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season +
## log(tcdc_low.cloud.layer + 1) + lag1 + tmax + hoursoftheday,
## data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.569 -0.241 0.014 0.343 8.286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.045279 0.389683 -5.249 1.55e-07 ***
## log(dswrf_surface + 1) 0.245992 0.015345 16.031 < 2e-16 ***
## lag12 -0.048349 0.004633 -10.435 < 2e-16 ***
## season2 -0.104497 0.030571 -3.418 0.000631 ***
## season3 -0.119869 0.043065 -2.783 0.005383 **
## season4 -0.098289 0.025379 -3.873 0.000108 ***
## log(tcdc_low.cloud.layer + 1) -0.067281 0.006038 -11.143 < 2e-16 ***
## lag1 0.710898 0.004918 144.558 < 2e-16 ***
## tmax 0.008866 0.001362 6.511 7.62e-11 ***
## hoursoftheday1 -0.019742 0.053329 -0.370 0.711241
## hoursoftheday2 -0.070340 0.053726 -1.309 0.190469
## hoursoftheday3 -0.158095 0.055432 -2.852 0.004348 **
## hoursoftheday4 -0.265027 0.059053 -4.488 7.23e-06 ***
## hoursoftheday5 -0.268403 0.062477 -4.296 1.75e-05 ***
## hoursoftheday6 0.147833 0.064363 2.297 0.021636 *
## hoursoftheday7 1.658424 0.066689 24.868 < 2e-16 ***
## hoursoftheday8 2.447971 0.072430 33.798 < 2e-16 ***
## hoursoftheday9 2.117520 0.082505 25.665 < 2e-16 ***
## hoursoftheday10 0.822343 0.104184 7.893 3.09e-15 ***
## hoursoftheday11 0.457408 0.106260 4.305 1.68e-05 ***
## hoursoftheday12 0.270546 0.107626 2.514 0.011953 *
## hoursoftheday13 -0.071516 0.108424 -0.660 0.509522
## hoursoftheday14 -0.844527 0.108744 -7.766 8.46e-15 ***
## hoursoftheday15 -1.897353 0.108776 -17.443 < 2e-16 ***
## hoursoftheday16 -2.696327 0.105670 -25.517 < 2e-16 ***
## hoursoftheday17 -2.635966 0.104760 -25.162 < 2e-16 ***
## hoursoftheday18 -2.220310 0.102805 -21.597 < 2e-16 ***
## hoursoftheday19 -1.576474 0.098668 -15.978 < 2e-16 ***
## hoursoftheday20 -1.346029 0.094648 -14.221 < 2e-16 ***
## hoursoftheday21 -1.212698 0.092295 -13.139 < 2e-16 ***
## hoursoftheday22 0.002548 0.053280 0.048 0.961858
## hoursoftheday23 0.005762 0.053282 0.108 0.913885
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.113 on 20927 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.9092, Adjusted R-squared: 0.9091
## F-statistic: 6760 on 31 and 20927 DF, p-value: < 2.2e-16
checkresiduals(lm8)
##
## Breusch-Godfrey test for serial correlation of order up to 35
##
## data: Residuals
## LM test = 2131.4, df = 35, p-value < 2.2e-16
##########################################
lm9<-lm(production~log(dswrf_surface+1)+lag12+season+log(tcdc_low.cloud.layer+1)+tmax+hoursoftheday+trend+lag1,data = datapn)
summary(lm9)
##
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season +
## log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend +
## lag1, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5650 -0.2418 0.0134 0.3424 8.2762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.137e+00 3.951e-01 -5.410 6.39e-08 ***
## log(dswrf_surface + 1) 2.445e-01 1.538e-02 15.891 < 2e-16 ***
## lag12 -4.866e-02 4.638e-03 -10.490 < 2e-16 ***
## season2 -1.090e-01 3.074e-02 -3.547 0.000390 ***
## season3 -1.305e-01 4.371e-02 -2.985 0.002842 **
## season4 -9.835e-02 2.538e-02 -3.875 0.000107 ***
## log(tcdc_low.cloud.layer + 1) -6.721e-02 6.038e-03 -11.132 < 2e-16 ***
## tmax 9.261e-03 1.390e-03 6.662 2.76e-11 ***
## hoursoftheday1 -1.987e-02 5.333e-02 -0.373 0.709472
## hoursoftheday2 -7.079e-02 5.373e-02 -1.318 0.187632
## hoursoftheday3 -1.591e-01 5.544e-02 -2.870 0.004110 **
## hoursoftheday4 -2.667e-01 5.906e-02 -4.516 6.35e-06 ***
## hoursoftheday5 -2.706e-01 6.249e-02 -4.329 1.50e-05 ***
## hoursoftheday6 1.459e-01 6.438e-02 2.267 0.023398 *
## hoursoftheday7 1.658e+00 6.669e-02 24.866 < 2e-16 ***
## hoursoftheday8 2.450e+00 7.244e-02 33.819 < 2e-16 ***
## hoursoftheday9 2.122e+00 8.256e-02 25.700 < 2e-16 ***
## hoursoftheday10 8.300e-01 1.043e-01 7.956 1.86e-15 ***
## hoursoftheday11 4.654e-01 1.064e-01 4.374 1.23e-05 ***
## hoursoftheday12 2.788e-01 1.078e-01 2.586 0.009704 **
## hoursoftheday13 -6.320e-02 1.086e-01 -0.582 0.560513
## hoursoftheday14 -8.362e-01 1.089e-01 -7.679 1.68e-14 ***
## hoursoftheday15 -1.889e+00 1.089e-01 -17.344 < 2e-16 ***
## hoursoftheday16 -2.689e+00 1.058e-01 -25.416 < 2e-16 ***
## hoursoftheday17 -2.629e+00 1.049e-01 -25.071 < 2e-16 ***
## hoursoftheday18 -2.214e+00 1.029e-01 -21.515 < 2e-16 ***
## hoursoftheday19 -1.570e+00 9.878e-02 -15.893 < 2e-16 ***
## hoursoftheday20 -1.339e+00 9.478e-02 -14.127 < 2e-16 ***
## hoursoftheday21 -1.205e+00 9.244e-02 -13.039 < 2e-16 ***
## hoursoftheday22 2.587e-03 5.328e-02 0.049 0.961275
## hoursoftheday23 5.821e-03 5.328e-02 0.109 0.913002
## trend -1.854e-06 1.313e-06 -1.412 0.157892
## lag1 7.108e-01 4.919e-03 144.508 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.113 on 20926 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.9092, Adjusted R-squared: 0.9091
## F-statistic: 6549 on 32 and 20926 DF, p-value: < 2.2e-16
checkresiduals(lm9)
##
## Breusch-Godfrey test for serial correlation of order up to 36
##
## data: Residuals
## LM test = 2134.5, df = 36, p-value < 2.2e-16
##############################################
# Note: lm10 uses the same formula as lm9, so the fit and diagnostics below are identical
lm10<-lm(production~log(dswrf_surface+1)+lag12+season+log(tcdc_low.cloud.layer+1)+tmax+hoursoftheday+trend+lag1,data = datapn)
summary(lm10)
##
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season +
## log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend +
## lag1, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5650 -0.2418 0.0134 0.3424 8.2762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.137e+00 3.951e-01 -5.410 6.39e-08 ***
## log(dswrf_surface + 1) 2.445e-01 1.538e-02 15.891 < 2e-16 ***
## lag12 -4.866e-02 4.638e-03 -10.490 < 2e-16 ***
## season2 -1.090e-01 3.074e-02 -3.547 0.000390 ***
## season3 -1.305e-01 4.371e-02 -2.985 0.002842 **
## season4 -9.835e-02 2.538e-02 -3.875 0.000107 ***
## log(tcdc_low.cloud.layer + 1) -6.721e-02 6.038e-03 -11.132 < 2e-16 ***
## tmax 9.261e-03 1.390e-03 6.662 2.76e-11 ***
## hoursoftheday1 -1.987e-02 5.333e-02 -0.373 0.709472
## hoursoftheday2 -7.079e-02 5.373e-02 -1.318 0.187632
## hoursoftheday3 -1.591e-01 5.544e-02 -2.870 0.004110 **
## hoursoftheday4 -2.667e-01 5.906e-02 -4.516 6.35e-06 ***
## hoursoftheday5 -2.706e-01 6.249e-02 -4.329 1.50e-05 ***
## hoursoftheday6 1.459e-01 6.438e-02 2.267 0.023398 *
## hoursoftheday7 1.658e+00 6.669e-02 24.866 < 2e-16 ***
## hoursoftheday8 2.450e+00 7.244e-02 33.819 < 2e-16 ***
## hoursoftheday9 2.122e+00 8.256e-02 25.700 < 2e-16 ***
## hoursoftheday10 8.300e-01 1.043e-01 7.956 1.86e-15 ***
## hoursoftheday11 4.654e-01 1.064e-01 4.374 1.23e-05 ***
## hoursoftheday12 2.788e-01 1.078e-01 2.586 0.009704 **
## hoursoftheday13 -6.320e-02 1.086e-01 -0.582 0.560513
## hoursoftheday14 -8.362e-01 1.089e-01 -7.679 1.68e-14 ***
## hoursoftheday15 -1.889e+00 1.089e-01 -17.344 < 2e-16 ***
## hoursoftheday16 -2.689e+00 1.058e-01 -25.416 < 2e-16 ***
## hoursoftheday17 -2.629e+00 1.049e-01 -25.071 < 2e-16 ***
## hoursoftheday18 -2.214e+00 1.029e-01 -21.515 < 2e-16 ***
## hoursoftheday19 -1.570e+00 9.878e-02 -15.893 < 2e-16 ***
## hoursoftheday20 -1.339e+00 9.478e-02 -14.127 < 2e-16 ***
## hoursoftheday21 -1.205e+00 9.244e-02 -13.039 < 2e-16 ***
## hoursoftheday22 2.587e-03 5.328e-02 0.049 0.961275
## hoursoftheday23 5.821e-03 5.328e-02 0.109 0.913002
## trend -1.854e-06 1.313e-06 -1.412 0.157892
## lag1 7.108e-01 4.919e-03 144.508 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.113 on 20926 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.9092, Adjusted R-squared: 0.9091
## F-statistic: 6549 on 32 and 20926 DF, p-value: < 2.2e-16
checkresiduals(lm10)
##
## Breusch-Godfrey test for serial correlation of order up to 36
##
## data: Residuals
## LM test = 2134.5, df = 36, p-value < 2.2e-16
##############################################
lm11<-lm(production~log(dswrf_surface+1)+lag12+season+log(tcdc_low.cloud.layer+1)+tmax+hoursoftheday+trend+lag2+lag1+lag1dswrf,data = datapn)
summary(lm11)
##
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season +
## log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend +
## lag2 + lag1 + lag1dswrf, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5464 -0.2451 0.0155 0.3504 8.7715
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.333e+00 3.969e-01 -5.879 4.20e-09 ***
## log(dswrf_surface + 1) 2.520e-01 1.564e-02 16.109 < 2e-16 ***
## lag12 -5.193e-02 4.628e-03 -11.222 < 2e-16 ***
## season2 -1.119e-01 3.081e-02 -3.633 0.000281 ***
## season3 -1.395e-01 4.357e-02 -3.202 0.001367 **
## season4 -1.108e-01 2.556e-02 -4.336 1.46e-05 ***
## log(tcdc_low.cloud.layer + 1) -7.517e-02 6.050e-03 -12.424 < 2e-16 ***
## tmax 1.009e-02 1.397e-03 7.221 5.35e-13 ***
## hoursoftheday1 -2.119e-02 5.311e-02 -0.399 0.689831
## hoursoftheday2 -7.552e-02 5.350e-02 -1.411 0.158123
## hoursoftheday3 -1.698e-01 5.521e-02 -3.075 0.002110 **
## hoursoftheday4 -2.853e-01 5.884e-02 -4.849 1.25e-06 ***
## hoursoftheday5 -2.946e-01 6.227e-02 -4.730 2.26e-06 ***
## hoursoftheday6 1.126e-01 6.419e-02 1.754 0.079501 .
## hoursoftheday7 1.585e+00 6.687e-02 23.700 < 2e-16 ***
## hoursoftheday8 2.276e+00 7.395e-02 30.774 < 2e-16 ***
## hoursoftheday9 1.963e+00 8.456e-02 23.211 < 2e-16 ***
## hoursoftheday10 7.726e-01 1.067e-01 7.241 4.62e-13 ***
## hoursoftheday11 5.403e-01 1.061e-01 5.090 3.61e-07 ***
## hoursoftheday12 3.983e-01 1.079e-01 3.691 0.000224 ***
## hoursoftheday13 6.960e-02 1.091e-01 0.638 0.523557
## hoursoftheday14 -6.866e-01 1.099e-01 -6.249 4.21e-10 ***
## hoursoftheday15 -1.706e+00 1.104e-01 -15.451 < 2e-16 ***
## hoursoftheday16 -2.479e+00 1.081e-01 -22.940 < 2e-16 ***
## hoursoftheday17 -2.438e+00 1.059e-01 -23.032 < 2e-16 ***
## hoursoftheday18 -2.115e+00 1.031e-01 -20.516 < 2e-16 ***
## hoursoftheday19 -1.552e+00 9.861e-02 -15.739 < 2e-16 ***
## hoursoftheday20 -1.376e+00 9.452e-02 -14.562 < 2e-16 ***
## hoursoftheday21 -1.242e+00 9.211e-02 -13.483 < 2e-16 ***
## hoursoftheday22 2.633e-03 5.480e-02 0.048 0.961670
## hoursoftheday23 6.222e-03 5.306e-02 0.117 0.906658
## trend -2.079e-06 1.310e-06 -1.587 0.112606
## lag2 -9.106e-02 7.130e-03 -12.771 < 2e-16 ***
## lag1 7.769e-01 6.982e-03 111.268 < 2e-16 ***
## lag1dswrf 7.319e-07 8.251e-05 0.009 0.992922
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.109 on 20924 degrees of freedom
## (155 observations deleted due to missingness)
## Multiple R-squared: 0.91, Adjusted R-squared: 0.9098
## F-statistic: 6221 on 34 and 20924 DF, p-value: < 2.2e-16
checkresiduals(lm11)
##
## Breusch-Godfrey test for serial correlation of order up to 38
##
## data: Residuals
## LM test = 2080, df = 38, p-value < 2.2e-16
##############################################
lm12<-lm(production~dswrf_surface+tmax+tcdc_entire.atmosphere+hoursoftheday+lag1+lag24+lag23+lag25+hafta+ay,data = datapn)
summary(lm12)
##
## Call:
## lm(formula = production ~ dswrf_surface + tmax + tcdc_entire.atmosphere +
## hoursoftheday + lag1 + lag24 + lag23 + lag25 + hafta + ay,
## data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.8056 -0.1944 0.0198 0.2923 8.3910
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.057e+00 4.947e-01 -4.158 3.22e-05 ***
## dswrf_surface 1.927e-04 8.251e-05 2.335 0.019529 *
## tmax 7.489e-03 1.724e-03 4.343 1.41e-05 ***
## tcdc_entire.atmosphere -3.386e-03 2.610e-04 -12.973 < 2e-16 ***
## hoursoftheday1 -1.162e-03 5.140e-02 -0.023 0.981961
## hoursoftheday2 -1.902e-03 5.140e-02 -0.037 0.970480
## hoursoftheday3 -2.810e-03 5.140e-02 -0.055 0.956395
## hoursoftheday4 -6.820e-03 5.140e-02 -0.133 0.894445
## hoursoftheday5 -8.457e-03 5.154e-02 -0.164 0.869665
## hoursoftheday6 2.035e-01 5.378e-02 3.785 0.000154 ***
## hoursoftheday7 1.398e+00 5.789e-02 24.155 < 2e-16 ***
## hoursoftheday8 2.152e+00 6.140e-02 35.046 < 2e-16 ***
## hoursoftheday9 2.145e+00 6.455e-02 33.237 < 2e-16 ***
## hoursoftheday10 1.473e+00 6.994e-02 21.058 < 2e-16 ***
## hoursoftheday11 1.224e+00 7.226e-02 16.933 < 2e-16 ***
## hoursoftheday12 1.127e+00 7.344e-02 15.345 < 2e-16 ***
## hoursoftheday13 9.586e-01 7.352e-02 13.038 < 2e-16 ***
## hoursoftheday14 5.036e-01 7.265e-02 6.931 4.29e-12 ***
## hoursoftheday15 -1.609e-01 7.104e-02 -2.265 0.023534 *
## hoursoftheday16 -7.260e-01 6.380e-02 -11.380 < 2e-16 ***
## hoursoftheday17 -6.895e-01 5.963e-02 -11.563 < 2e-16 ***
## hoursoftheday18 -4.457e-01 5.702e-02 -7.817 5.66e-15 ***
## hoursoftheday19 -6.425e-02 5.537e-02 -1.160 0.245933
## hoursoftheday20 -1.746e-02 5.407e-02 -0.323 0.746766
## hoursoftheday21 -1.292e-02 5.327e-02 -0.243 0.808371
## hoursoftheday22 3.497e-03 5.140e-02 0.068 0.945758
## hoursoftheday23 1.715e-03 5.138e-02 0.033 0.973379
## lag1 6.975e-01 5.250e-03 132.846 < 2e-16 ***
## lag24 1.819e-01 8.513e-03 21.370 < 2e-16 ***
## lag23 1.028e-01 6.615e-03 15.536 < 2e-16 ***
## lag25 -1.436e-01 6.822e-03 -21.054 < 2e-16 ***
## hafta10 -6.158e-02 1.531e-01 -0.402 0.687528
## hafta11 1.610e-03 1.534e-01 0.010 0.991626
## hafta12 4.108e-02 1.530e-01 0.269 0.788294
## hafta13 -2.343e-02 1.545e-01 -0.152 0.879496
## hafta14 -1.733e-01 2.245e-01 -0.772 0.440132
## hafta15 -1.131e-01 2.240e-01 -0.505 0.613539
## hafta16 -1.448e-01 2.251e-01 -0.643 0.520111
## hafta17 -1.732e-01 2.254e-01 -0.768 0.442347
## hafta18 -1.202e-01 2.457e-01 -0.489 0.624639
## hafta19 -1.484e-01 2.564e-01 -0.579 0.562755
## hafta2 4.966e-02 6.881e-02 0.722 0.470478
## hafta20 -1.045e-01 2.567e-01 -0.407 0.684012
## hafta21 -1.840e-01 2.575e-01 -0.715 0.474866
## hafta22 -2.169e-01 2.647e-01 -0.819 0.412545
## hafta23 -2.744e-01 2.852e-01 -0.962 0.335904
## hafta24 -2.985e-01 2.850e-01 -1.047 0.294902
## hafta25 -3.241e-01 2.848e-01 -1.138 0.255169
## hafta26 -2.982e-01 2.860e-01 -1.043 0.297076
## hafta27 -2.354e-01 3.313e-01 -0.711 0.477378
## hafta28 -1.763e-01 3.308e-01 -0.533 0.594179
## hafta29 -2.357e-01 3.314e-01 -0.711 0.476950
## hafta3 3.509e-02 6.876e-02 0.510 0.609803
## hafta30 -2.464e-01 3.320e-01 -0.742 0.457951
## hafta31 -3.358e-01 3.451e-01 -0.973 0.330504
## hafta32 -3.453e-01 3.570e-01 -0.967 0.333367
## hafta33 -3.754e-01 3.571e-01 -1.051 0.293242
## hafta34 -3.537e-01 3.566e-01 -0.992 0.321279
## hafta35 -3.502e-01 3.584e-01 -0.977 0.328456
## hafta36 -2.861e-01 3.783e-01 -0.756 0.449495
## hafta37 -2.526e-01 3.780e-01 -0.668 0.503850
## hafta38 -2.411e-01 3.781e-01 -0.638 0.523704
## hafta39 -2.589e-01 3.775e-01 -0.686 0.492857
## hafta4 -7.709e-02 6.927e-02 -1.113 0.265751
## hafta40 -1.868e-01 1.959e-01 -0.953 0.340459
## hafta41 -1.593e-01 1.951e-01 -0.816 0.414361
## hafta42 -1.933e-01 1.942e-01 -0.995 0.319555
## hafta43 -1.556e-01 1.947e-01 -0.799 0.424320
## hafta44 -1.323e-01 1.623e-01 -0.815 0.414964
## hafta45 -1.528e-01 1.535e-01 -0.996 0.319492
## hafta46 -1.426e-01 1.527e-01 -0.934 0.350387
## hafta47 -8.786e-02 1.516e-01 -0.580 0.562105
## hafta48 5.671e-03 1.207e-01 0.047 0.962537
## hafta49 2.541e-02 7.708e-02 0.330 0.741655
## hafta5 2.341e-02 8.886e-02 0.263 0.792242
## hafta50 3.814e-02 7.721e-02 0.494 0.621293
## hafta51 -8.684e-02 7.659e-02 -1.134 0.256834
## hafta52 -5.098e-02 7.680e-02 -0.664 0.506810
## hafta53 5.969e-02 1.626e-01 0.367 0.713544
## hafta6 1.676e-02 1.190e-01 0.141 0.887974
## hafta7 -1.727e-02 1.190e-01 -0.145 0.884548
## hafta8 -1.862e-02 1.190e-01 -0.156 0.875691
## hafta9 -5.735e-02 1.300e-01 -0.441 0.659042
## ay10 1.250e-01 1.759e-01 0.710 0.477593
## ay11 7.202e-02 1.302e-01 0.553 0.580088
## ay12 NA NA NA NA
## ay2 1.325e-01 9.683e-02 1.368 0.171292
## ay3 1.254e-01 1.363e-01 0.920 0.357524
## ay4 1.801e-01 2.125e-01 0.848 0.396589
## ay5 1.607e-01 2.448e-01 0.657 0.511495
## ay6 2.527e-01 2.719e-01 0.930 0.352635
## ay7 1.249e-01 3.193e-01 0.391 0.695561
## ay8 1.768e-01 3.446e-01 0.513 0.607844
## ay9 1.474e-01 3.682e-01 0.400 0.688971
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.073 on 20852 degrees of freedom
## (169 observations deleted due to missingness)
## Multiple R-squared: 0.9159, Adjusted R-squared: 0.9156
## F-statistic: 2469 on 92 and 20852 DF, p-value: < 2.2e-16
checkresiduals(lm12)
##
## Breusch-Godfrey test for serial correlation of order up to 97
##
## data: Residuals
## LM test = 1354.2, df = 97, p-value < 2.2e-16
##############################################
lm13<-lm(production~dswrf_surface+tmp_surface+tcdc_entire.atmosphere+lag73+lag72+lag71+lag48+lag47+lag49+ay+hoursoftheday+hafta,data = datapn)
summary(lm13)
##
## Call:
## lm(formula = production ~ dswrf_surface + tmp_surface + tcdc_entire.atmosphere +
## lag73 + lag72 + lag71 + lag48 + lag47 + lag49 + ay + hoursoftheday +
## hafta, data = datapn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1546 -0.4938 0.0334 0.6243 8.1348
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.332e+01 7.362e-01 -18.095 < 2e-16 ***
## dswrf_surface 3.363e-03 1.196e-04 28.124 < 2e-16 ***
## tmp_surface 5.118e-02 2.696e-03 18.983 < 2e-16 ***
## tcdc_entire.atmosphere -1.133e-02 3.547e-04 -31.941 < 2e-16 ***
## lag73 -2.727e-02 9.588e-03 -2.844 0.004456 **
## lag72 1.253e-01 1.197e-02 10.469 < 2e-16 ***
## lag71 2.693e-02 9.548e-03 2.821 0.004794 **
## lag48 1.744e-01 1.197e-02 14.568 < 2e-16 ***
## lag47 4.096e-02 9.545e-03 4.292 1.78e-05 ***
## lag49 -3.237e-02 9.635e-03 -3.359 0.000784 ***
## ay10 8.955e-01 3.333e-01 2.687 0.007214 **
## ay11 5.753e-01 2.898e-01 1.985 0.047150 *
## ay12 4.499e-01 2.267e-01 1.984 0.047218 *
## ay2 5.847e-01 1.342e-01 4.357 1.32e-05 ***
## ay3 5.206e-01 1.890e-01 2.755 0.005870 **
## ay4 8.294e-01 2.947e-01 2.814 0.004894 **
## ay5 9.637e-01 3.396e-01 2.838 0.004543 **
## ay6 1.058e+00 3.771e-01 2.806 0.005016 **
## ay7 7.699e-01 4.430e-01 1.738 0.082249 .
## ay8 8.581e-01 4.781e-01 1.795 0.072719 .
## ay9 7.707e-01 5.109e-01 1.509 0.131388
## hoursoftheday1 1.637e-02 7.134e-02 0.229 0.818550
## hoursoftheday2 3.317e-02 7.136e-02 0.465 0.642119
## hoursoftheday3 4.943e-02 7.140e-02 0.692 0.488769
## hoursoftheday4 6.383e-02 7.145e-02 0.893 0.371687
## hoursoftheday5 9.228e-02 7.183e-02 1.285 0.198882
## hoursoftheday6 3.553e-01 7.660e-02 4.639 3.53e-06 ***
## hoursoftheday7 1.638e+00 8.430e-02 19.431 < 2e-16 ***
## hoursoftheday8 3.255e+00 8.944e-02 36.394 < 2e-16 ***
## hoursoftheday9 4.324e+00 9.185e-02 47.075 < 2e-16 ***
## hoursoftheday10 3.344e+00 1.002e-01 33.373 < 2e-16 ***
## hoursoftheday11 3.042e+00 1.031e-01 29.499 < 2e-16 ***
## hoursoftheday12 2.716e+00 1.046e-01 25.976 < 2e-16 ***
## hoursoftheday13 2.350e+00 1.043e-01 22.530 < 2e-16 ***
## hoursoftheday14 1.690e+00 1.028e-01 16.439 < 2e-16 ***
## hoursoftheday15 6.165e-01 1.006e-01 6.130 8.94e-10 ***
## hoursoftheday16 -1.930e-01 9.007e-02 -2.143 0.032141 *
## hoursoftheday17 -9.460e-01 8.335e-02 -11.350 < 2e-16 ***
## hoursoftheday18 -1.167e+00 7.894e-02 -14.779 < 2e-16 ***
## hoursoftheday19 -9.316e-01 7.645e-02 -12.185 < 2e-16 ***
## hoursoftheday20 -7.046e-01 7.488e-02 -9.410 < 2e-16 ***
## hoursoftheday21 -5.683e-01 7.388e-02 -7.693 1.50e-14 ***
## hoursoftheday22 -3.435e-02 7.137e-02 -0.481 0.630327
## hoursoftheday23 -1.664e-02 7.132e-02 -0.233 0.815535
## hafta10 -6.244e-01 2.131e-01 -2.930 0.003392 **
## hafta11 -3.161e-01 2.133e-01 -1.482 0.138285
## hafta12 -1.739e-01 2.134e-01 -0.815 0.414993
## hafta13 -5.574e-01 2.145e-01 -2.599 0.009349 **
## hafta14 -1.367e+00 3.111e-01 -4.395 1.11e-05 ***
## hafta15 -1.108e+00 3.109e-01 -3.565 0.000365 ***
## hafta16 -1.335e+00 3.116e-01 -4.286 1.82e-05 ***
## hafta17 -1.450e+00 3.118e-01 -4.650 3.34e-06 ***
## hafta18 -1.366e+00 3.403e-01 -4.014 6.00e-05 ***
## hafta19 -1.573e+00 3.546e-01 -4.436 9.22e-06 ***
## hafta2 -1.306e-02 9.814e-02 -0.133 0.894125
## hafta20 -1.451e+00 3.549e-01 -4.088 4.38e-05 ***
## hafta21 -1.696e+00 3.559e-01 -4.767 1.88e-06 ***
## hafta22 -1.854e+00 3.655e-01 -5.071 3.99e-07 ***
## hafta23 -1.891e+00 3.943e-01 -4.797 1.63e-06 ***
## hafta24 -1.997e+00 3.943e-01 -5.066 4.09e-07 ***
## hafta25 -2.130e+00 3.943e-01 -5.401 6.70e-08 ***
## hafta26 -2.041e+00 3.959e-01 -5.155 2.56e-07 ***
## hafta27 -2.012e+00 4.582e-01 -4.391 1.13e-05 ***
## hafta28 -1.760e+00 4.579e-01 -3.843 0.000122 ***
## hafta29 -1.982e+00 4.582e-01 -4.326 1.53e-05 ***
## hafta3 1.372e-01 9.824e-02 1.397 0.162402
## hafta30 -2.004e+00 4.585e-01 -4.372 1.24e-05 ***
## hafta31 -2.261e+00 4.764e-01 -4.745 2.10e-06 ***
## hafta32 -2.241e+00 4.928e-01 -4.547 5.46e-06 ***
## hafta33 -2.319e+00 4.928e-01 -4.706 2.54e-06 ***
## hafta34 -2.177e+00 4.923e-01 -4.421 9.87e-06 ***
## hafta35 -2.135e+00 4.949e-01 -4.313 1.61e-05 ***
## hafta36 -1.867e+00 5.233e-01 -3.568 0.000361 ***
## hafta37 -1.647e+00 5.229e-01 -3.150 0.001634 **
## hafta38 -1.613e+00 5.230e-01 -3.083 0.002050 **
## hafta39 -1.611e+00 5.229e-01 -3.081 0.002063 **
## hafta4 -4.070e-01 9.903e-02 -4.109 3.98e-05 ***
## hafta40 -1.549e+00 3.383e-01 -4.578 4.72e-06 ***
## hafta41 -1.413e+00 3.376e-01 -4.186 2.84e-05 ***
## hafta42 -1.594e+00 3.372e-01 -4.727 2.29e-06 ***
## hafta43 -1.215e+00 3.374e-01 -3.601 0.000317 ***
## hafta44 -1.081e+00 3.025e-01 -3.572 0.000355 ***
## hafta45 -1.148e+00 2.940e-01 -3.906 9.43e-05 ***
## hafta46 -9.236e-01 2.932e-01 -3.150 0.001637 **
## hafta47 -8.388e-01 2.925e-01 -2.868 0.004141 **
## hafta48 -5.179e-01 2.638e-01 -1.963 0.049624 *
## hafta49 -4.836e-01 2.301e-01 -2.102 0.035605 *
## hafta5 -2.655e-02 1.260e-01 -0.211 0.833073
## hafta50 -5.061e-01 2.304e-01 -2.197 0.028048 *
## hafta51 -9.263e-01 2.299e-01 -4.030 5.60e-05 ***
## hafta52 -5.683e-01 2.300e-01 -2.471 0.013497 *
## hafta53 NA NA NA NA
## hafta6 -1.652e-01 1.672e-01 -0.988 0.323254
## hafta7 -2.374e-01 1.672e-01 -1.419 0.155783
## hafta8 -3.513e-01 1.664e-01 -2.111 0.034802 *
## hafta9 -6.465e-01 1.806e-01 -3.579 0.000345 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.489 on 20825 degrees of freedom
## (194 observations deleted due to missingness)
## Multiple R-squared: 0.8384, Adjusted R-squared: 0.8377
## F-statistic: 1149 on 94 and 20825 DF, p-value: < 2.2e-16
checkresiduals(lm13)
##
## Breusch-Godfrey test for serial correlation of order up to 99
##
## data: Residuals
## LM test = 10694, df = 99, p-value < 2.2e-16
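As we continued our search, we found that Model 12 (lm12) has a high R^2 value (0.9159), which would meet our expectations. However, Model 12 relies on lag1, the production one hour earlier, and our aim is to forecast production 48 hours ahead, when that value is not yet known. We therefore constructed a very similar Model 13 (lm13) that uses only longer production lags. Its R^2 is lower (0.8384), but it is the best candidate for our aim.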
After evaluating multiple models, we chose Model 13 (lm13) for our final analysis. This decision was based on how well it captures the key factors influencing solar power production while relying only on longer production lags. Model 13 combines three weather variables, downward shortwave radiation flux (dswrf_surface), surface temperature (tmp_surface), and total cloud cover (tcdc_entire.atmosphere), with six lagged production values (lag47, lag48, lag49, lag71, lag72, and lag73). It also includes categorical time features: the hour of the day (hoursoftheday), the month (ay), and the week of the year (hafta). Production from roughly two and three days earlier, together with the weather variables, lets the model capture both the daily production cycle and changing weather conditions, which makes it the most suitable choice for our forecasting needs.
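The lagged predictors used by Model 13 can be illustrated with a minimal sketch. The construction of the real lag columns is not shown in this section, so the code below is an assumption about how such columns could be built with `data.table::shift()`; the table `toy` and its values are made up for illustration.

```r
library(data.table)

# Toy hourly series standing in for the real production data (values are made up).
toy <- data.table(
  datetime   = seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"),
                   by = "hour", length.out = 100),
  production = as.numeric(1:100)
)

# The lags used by lm13: production 47 to 49 and 71 to 73 hours earlier.
lags     <- c(47, 48, 49, 71, 72, 73)
lag_cols <- paste0("lag", lags)

# shift() with type = "lag" moves the series down by k rows,
# so row t of lag<k> holds the production observed k hours before row t.
toy[, (lag_cols) := shift(production, n = lags, type = "lag")]

# The first k rows of each lag<k> column are NA because no earlier value exists.
toy[c(1, 49, 100), .(datetime, production, lag48, lag72)]
```

Rows with missing lag values cannot be used by `lm()`, which is one reason a model summary can report observations deleted due to missingness.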
#checkresiduals(lm13)
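# Keep only the forecast window, which starts on 2024-05-14.
tmp=copy(datapn)
tmp=tmp[tmp$date.x >="2024-05-14",]
tmp[,actual:=production]
# Note that the predictions in this run come from lm12, not lm13.
tmp[,predicted_trend:=predict(lm12,tmp)]
tmp[,residual_trend:=actual-predicted_trend]
# Copy the hour and date columns under the .y names.
tmp[,hour.y:=hour.x]
tmp[,date.y:=date.x]
tmp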
## Key: <datetime>
## datetime date.x hour.x dswrf_surface tcdc_low.cloud.layer
## <POSc> <IDat> <int> <num> <num>
## 1: 2024-05-14 00:00:00 2024-05-14 0 0.0000 27.748
## 2: 2024-05-14 01:00:00 2024-05-14 1 0.0000 23.612
## 3: 2024-05-14 02:00:00 2024-05-14 2 0.0000 22.176
## 4: 2024-05-14 03:00:00 2024-05-14 3 0.0000 25.244
## 5: 2024-05-14 04:00:00 2024-05-14 4 0.0000 66.244
## ---
## 378: 2024-05-29 17:00:00 2024-05-29 17 557.5424 12.980
## 379: 2024-05-29 18:00:00 2024-05-29 18 475.3048 12.872
## 380: 2024-05-29 19:00:00 2024-05-29 19 394.9286 11.532
## 381: 2024-05-29 20:00:00 2024-05-29 20 321.1014 10.816
## 382: 2024-05-29 21:00:00 2024-05-29 21 267.5859 11.804
## tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
## <num> <num> <num>
## 1: 19.448 0.000 33.540
## 2: 16.940 0.000 29.796
## 3: 15.980 0.000 28.896
## 4: 15.400 0.000 31.368
## 5: 6.440 0.000 67.796
## ---
## 378: 39.776 7.884 48.748
## 379: 43.312 8.256 51.632
## 380: 43.768 7.324 51.172
## 381: 43.488 6.524 50.376
## 382: 42.588 5.464 48.780
## uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface
## <num> <num> <num> <num>
## 1: 0.0000 0.04 263.859 0.00000
## 2: 0.0000 0.04 261.819 0.00000
## 3: 0.0000 0.04 260.644 0.00000
## 4: 0.0000 0.04 262.063 0.00000
## 5: 0.0000 0.12 289.007 0.00000
## ---
## 378: 261.6896 0.00 337.264 101.88800
## 379: 242.1421 0.00 337.765 89.93280
## 380: 214.1965 0.00 336.652 76.80448
## 381: 179.0547 0.00 335.485 62.73920
## 382: 149.2122 0.00 333.756 52.28160
## tmp_surface production lag15 lag48 lag72 lag96 lag95 lag47 lag71 lag49
## <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
## 1: 278.0520 0.00 8.88 0 0.00 0.00 0.00 0 0.00 0
## 2: 277.6708 0.00 7.85 0 0.00 0.00 0.00 0 0.00 0
## 3: 277.3680 0.00 8.36 0 0.00 0.00 0.00 0 0.00 0
## 4: 277.5830 0.00 5.34 0 0.00 0.00 0.05 0 0.03 0
## 5: 277.9880 0.07 4.04 0 0.03 0.05 0.63 0 0.64 0
## ---
## 378: 297.2530 NA NA NA NA NA NA NA NA NA
## 379: 294.2180 NA NA NA NA NA NA NA NA NA
## 380: 291.3340 NA NA NA NA NA NA NA NA NA
## 381: 288.4490 NA NA NA NA NA NA NA NA NA
## 382: 287.7440 NA NA NA NA NA NA NA NA NA
## lag73 lag14 lag13 lag12 lag11 lag16 lag24 lag23 lag25 lag8 lag6 lag1
## <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
## 1: 0 7.85 8.36 5.34 4.04 9.24 0.00 0.00 0 3.62 0.16 0
## 2: 0 8.36 5.34 4.04 5.89 8.88 0.00 0.00 0 1.30 0.00 0
## 3: 0 5.34 4.04 5.89 5.32 7.85 0.00 0.00 0 0.16 0.00 0
## 4: 0 4.04 5.89 5.32 3.62 8.36 0.00 0.06 0 0.00 0.00 0
## 5: 0 5.89 5.32 3.62 1.30 5.34 0.06 0.76 0 0.00 0.00 0
## ---
## 378: NA NA NA NA NA NA NA NA NA NA NA NA
## 379: NA NA NA NA NA NA NA NA NA NA NA NA
## 380: NA NA NA NA NA NA NA NA NA NA NA NA
## 381: NA NA NA NA NA NA NA NA NA NA NA NA
## 382: NA NA NA NA NA NA NA NA NA NA NA NA
## lag2 hoursoftheday season saat gun hafta ay tmax tmin
## <num> <fctr> <fctr> <char> <char> <char> <char> <num> <num>
## 1: 0 0 2 0 14 20 5 296.812 277.368
## 2: 0 1 2 1 14 20 5 296.812 277.368
## 3: 0 2 2 2 14 20 5 296.812 277.368
## 4: 0 3 2 3 14 20 5 296.812 277.368
## 5: 0 4 2 4 14 20 5 296.812 277.368
## ---
## 378: NA 17 2 17 29 22 5 304.153 282.081
## 379: NA 18 2 18 29 22 5 304.153 282.081
## 380: NA 19 2 19 29 22 5 304.153 282.081
## 381: NA 20 2 20 29 22 5 304.153 282.081
## 382: NA 21 2 21 29 22 5 304.153 282.081
## trend lag1dswrf lag12dswrf actual predicted_trend residual_trend hour.y
## <int> <num> <num> <num> <num> <num> <int>
## 1: 20733 0.0000 498.22800 0.00 0.10846081 -0.108460815 0
## 2: 20734 0.0000 536.99040 0.00 0.11997481 -0.119974809 1
## 3: 20735 0.0000 529.10560 0.00 0.12228199 -0.122281990 2
## 4: 20736 0.0000 505.62432 0.00 0.11917042 -0.119170419 3
## 5: 20737 0.0000 356.10880 0.07 0.07468105 -0.004681054 4
## ---
## 378: 21110 628.2720 0.00000 NA NA NA 17
## 379: 21111 557.5424 4.54800 NA NA NA 18
## 380: 21112 475.3048 38.40064 NA NA NA 19
## 381: 21113 394.9286 96.83520 NA NA NA 20
## 382: 21114 321.1014 169.20064 NA NA NA 21
## date.y
## <IDat>
## 1: 2024-05-14
## 2: 2024-05-14
## 3: 2024-05-14
## 4: 2024-05-14
## 5: 2024-05-14
## ---
## 378: 2024-05-29
## 379: 2024-05-29
## 380: 2024-05-29
## 381: 2024-05-29
## 382: 2024-05-29
# 'tmp' contains the 'actual' and 'predicted_trend' columns
ggplot(tmp, aes(x=datetime)) +
  geom_line(aes(y=actual, color="Actual")) +
  geom_line(aes(y=predicted_trend, color="Predicted")) +
  labs(title = "Actual vs Predicted Production",
       subtitle = paste("Forecast from", min(tmp$date.x), "to", max(tmp$date.x)),
       x = "Date",
       y = "Production") +
  theme_minimal() +
  scale_color_manual(values = c("Actual" = "blue", "Predicted" = "red"))
## Warning: Removed 118 rows containing missing values (`geom_line()`).
## Warning: Removed 117 rows containing missing values (`geom_line()`).
Our analysis and modeling efforts have shown that a combination of weather variables and historical production data can be used to forecast hourly solar power production at the Edikli GES solar power plant. By iteratively building and refining multiple linear regression models, we identified Model 13 as the best candidate for our 48-hour-ahead forecasting aim. This model incorporates a diverse set of features, including downward shortwave radiation flux, surface temperature, total cloud cover, and production values lagged by roughly two and three days. The inclusion of categorical time features further enhanced the model’s ability to account for daily, weekly, and monthly patterns in solar power production.
Our approach highlighted the importance of feature engineering in improving model performance. The creation of lagged variables and the inclusion of detailed weather data were crucial in capturing the temporal and environmental factors influencing solar power production. Additionally, the iterative model-building process allowed us to systematically evaluate and incorporate the most significant predictors, leading to a final model with an adjusted R^2 of 0.8377.
Let's check the WMAPE (weighted mean absolute percentage error) of our predictions.
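# WMAPE: total absolute error divided by total actual production (NA values are skipped).
calculate_wmape <- function(actual, predicted) {
  sum_abs_errors <- sum(abs(actual - predicted), na.rm = TRUE)
  total_actual <- sum(actual, na.rm = TRUE)
  # The ratio is undefined when total actual production is zero.
  if (total_actual == 0) {
    return(NA)
  } else {
    wmape <- sum_abs_errors / total_actual
    return(wmape)
  }
}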
wmape_value <- calculate_wmape(tmp$actual, tmp$predicted_trend)
print(paste("The WMAPE value is:", wmape_value))
## [1] "The WMAPE value is: 0.199644696554339"
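For reference, the quantity computed by `calculate_wmape` can be written as follows. The symbols are introduced here for illustration only: $a_t$ is the actual production in hour $t$ and $p_t$ is the corresponding prediction.

```latex
\mathrm{WMAPE} = \frac{\sum_{t} \lvert a_t - p_t \rvert}{\sum_{t} a_t}
```

In the function, NA terms are skipped in each sum separately. A value of roughly 0.20 therefore means that the total absolute error is about 20% of the total actual production over the scored hours.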
A WMAPE of about 0.20 is acceptable but not great, and several potential improvements could further enhance the model's accuracy and robustness.
One such improvement would be to explore non-linear models such as regression trees or random forests. These approaches could capture more complex relationships in the data that linear regression models may miss.
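As a rough illustration of this direction, the sketch below fits a regression tree with the `rpart` package next to a linear model and scores both with WMAPE. It is not part of our analysis: the data frame `sim` is synthetic, the response formula is made up, and only the column names are borrowed from our dataset.

```r
library(rpart)  # recommended package distributed with R

set.seed(1)
n <- 600

# Synthetic stand-in data; these values are made up and are not the plant's data.
sim <- data.frame(
  dswrf_surface          = runif(n, 0, 900),
  tcdc_entire.atmosphere = runif(n, 0, 100)
)

# Made-up non-linear response: output saturates at high irradiance
# and shrinks as cloud cover increases.
sim$production <- pmax(
  0,
  10 * (1 - exp(-sim$dswrf_surface / 300)) *
    (1 - 0.6 * sim$tcdc_entire.atmosphere / 100) + rnorm(n, sd = 0.5)
)

# Simple holdout split: first 450 rows for fitting, the rest for scoring.
train <- sim[1:450, ]
test  <- sim[451:n, ]

tree_fit <- rpart(production ~ dswrf_surface + tcdc_entire.atmosphere,
                  data = train, method = "anova")
lm_fit   <- lm(production ~ dswrf_surface + tcdc_entire.atmosphere,
               data = train)

# Same definition as calculate_wmape, without the NA handling.
wmape <- function(actual, predicted) sum(abs(actual - predicted)) / sum(actual)

wmape_tree <- wmape(test$production, predict(tree_fit, newdata = test))
wmape_lm   <- wmape(test$production, predict(lm_fit, newdata = test))
c(tree = wmape_tree, lm = wmape_lm)
```

Which model scores better depends on the data, so on the real dataset the comparison should use a holdout period that matches the forecasting window.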
Although we use many weather variables, more detailed weather data, such as wind speed and humidity, may be available. These could provide a more comprehensive understanding of the factors affecting solar power production.
By pursuing these extensions, we can continue to refine our forecasting model and enhance its ability to accurately predict solar power production, ultimately contributing to more effective energy management and planning.
Our main code is also available on our GitHub page.
We also attempted ARIMA models, which unfortunately did not succeed; those attempts can also be found on the GitHub page.